
This dataset was scraped from nextspaceflight.com and includes all the space missions since the beginning of the Space Race between the USA and the Soviet Union in 1957!
%pip install iso3166
Run the cell below if you are working with Google Colab.
%pip install --upgrade plotly
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
# These might be helpful:
from iso3166 import countries
from datetime import datetime, timedelta
pd.options.display.float_format = '{:,.2f}'.format
df_data = pd.read_csv('mission_launches.csv')
df_data.head()
| | Unnamed: 0.1 | Unnamed: 0 | Organisation | Location | Date | Detail | Rocket_Status | Price | Mission_Status |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | SpaceX | LC-39A, Kennedy Space Center, Florida, USA | Fri Aug 07, 2020 05:12 UTC | Falcon 9 Block 5 \| Starlink V1 L9 & BlackSky | StatusActive | 50.0 | Success |
| 1 | 1 | 1 | CASC | Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... | Thu Aug 06, 2020 04:01 UTC | Long March 2D \| Gaofen-9 04 & Q-SAT | StatusActive | 29.75 | Success |
| 2 | 2 | 2 | SpaceX | Pad A, Boca Chica, Texas, USA | Tue Aug 04, 2020 23:57 UTC | Starship Prototype \| 150 Meter Hop | StatusActive | NaN | Success |
| 3 | 3 | 3 | Roscosmos | Site 200/39, Baikonur Cosmodrome, Kazakhstan | Thu Jul 30, 2020 21:25 UTC | Proton-M/Briz-M \| Ekspress-80 & Ekspress-103 | StatusActive | 65.0 | Success |
| 4 | 4 | 4 | ULA | SLC-41, Cape Canaveral AFS, Florida, USA | Thu Jul 30, 2020 11:50 UTC | Atlas V 541 \| Perseverance | StatusActive | 145.0 | Success |
df_data.shape
(4324, 9)
df_data.columns
Index(['Unnamed: 0.1', 'Unnamed: 0', 'Organisation', 'Location', 'Date',
'Detail', 'Rocket_Status', 'Price', 'Mission_Status'],
dtype='object')
df_data.isna().values.any()
True
Consider removing columns containing junk data.
columns_to_drop = ['Unnamed: 0.1','Unnamed: 0']
df = df_data.drop(columns_to_drop, axis=1)
df.head()
| | Organisation | Location | Date | Detail | Rocket_Status | Price | Mission_Status |
|---|---|---|---|---|---|---|---|
| 0 | SpaceX | LC-39A, Kennedy Space Center, Florida, USA | Fri Aug 07, 2020 05:12 UTC | Falcon 9 Block 5 \| Starlink V1 L9 & BlackSky | StatusActive | 50.0 | Success |
| 1 | CASC | Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... | Thu Aug 06, 2020 04:01 UTC | Long March 2D \| Gaofen-9 04 & Q-SAT | StatusActive | 29.75 | Success |
| 2 | SpaceX | Pad A, Boca Chica, Texas, USA | Tue Aug 04, 2020 23:57 UTC | Starship Prototype \| 150 Meter Hop | StatusActive | NaN | Success |
| 3 | Roscosmos | Site 200/39, Baikonur Cosmodrome, Kazakhstan | Thu Jul 30, 2020 21:25 UTC | Proton-M/Briz-M \| Ekspress-80 & Ekspress-103 | StatusActive | 65.0 | Success |
| 4 | ULA | SLC-41, Cape Canaveral AFS, Florida, USA | Thu Jul 30, 2020 11:50 UTC | Atlas V 541 \| Perseverance | StatusActive | 145.0 | Success |
df.describe()
| | Organisation | Location | Date | Detail | Rocket_Status | Price | Mission_Status |
|---|---|---|---|---|---|---|---|
| count | 4324 | 4324 | 4324 | 4324 | 4324 | 964 | 4324 |
| unique | 56 | 137 | 4319 | 4278 | 2 | 56 | 4 |
| top | RVSN USSR | Site 31/6, Baikonur Cosmodrome, Kazakhstan | Wed Nov 05, 2008 00:15 UTC | Cosmos-3MRB (65MRB) \| BOR-5 Shuttle | StatusRetired | 450.0 | Success |
| freq | 1777 | 235 | 2 | 6 | 3534 | 136 | 3879 |
Create a chart that shows the number of space mission launches by organisation.
launch_count = df.Organisation.value_counts()
launch_count = launch_count.sort_values(ascending=False)
plt.figure(figsize=(20, 12))
plt.barh(launch_count.index, launch_count.values)
plt.ylabel('Organisation', fontsize=5)
plt.xlabel('Number of Space Mission Launches')
plt.title('Number of Space Mission Launches by Organisation')
plt.show()
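How many rockets are active compared to those that are decommissioned?
# Number of launches whose rocket is still active (kept in its own variable so it is not overwritten below)
active_count = df[df['Rocket_Status'] == 'StatusActive'].shape[0]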
active_rockets = df['Rocket_Status'].value_counts()
plt.barh(active_rockets.index, active_rockets.values)
plt.ylabel('Rocket Status', fontsize=12)
plt.xlabel('Number of Rockets')
plt.title('Number of Active versus Retired Rockets')
plt.show()
How many missions were successful? How many missions failed?
# Number of successful missions (kept in its own variable so it is not overwritten below)
success_count = df[df['Mission_Status'] == 'Success'].shape[0]
mission_status = df['Mission_Status'].value_counts()
plt.barh(mission_status.index, mission_status.values)
plt.ylabel('Mission Status', fontsize=12)
plt.xlabel('Number of Missions')
plt.title('Distribution of Mission Status')
plt.show()
Create a histogram and visualise the distribution. The price column is given in USD millions (careful of missing values).
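# Price is stored as text (some values contain thousands separators), so convert it to float
df_clean = df.dropna(subset=['Price']).copy()
df_clean['Price'] = df_clean['Price'].astype(str).str.replace(',', '').astype(float)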
df_clean.Price.isna().values.any()
False
plt.figure(figsize=(8, 6))
plt.hist(df_clean["Price"], bins=10, edgecolor='black')
plt.xlabel('Price (USD millions)')
plt.xticks(rotation=90)
plt.ylabel('Frequency')
plt.title('Distribution of Prices')
plt.show()
Extract a country feature and change the country names that no longer exist.
Wrangle the Country Names
You'll need to use a three-letter country code for each country. You might have to change some country names.
You can use the iso3166 package to convert the country names to Alpha3 format.
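A minimal sketch of the wrangling step, assuming a small hand-written lookup table (`ALPHA3_BY_NAME` and `location_to_alpha3` are hypothetical names, not part of the notebook). With the iso3166 package installed, `countries.get(name).alpha3` could replace the dictionary for names the package recognises.

```python
# Hypothetical lookup table: names as they might appear at the end of a
# Location string, mapped to ISO 3166-1 alpha-3 codes. Extend as needed.
ALPHA3_BY_NAME = {
    "USA": "USA",
    "Russia": "RUS",
    "Kazakhstan": "KAZ",
    "China": "CHN",
    "France": "FRA",
    "Japan": "JPN",
    "India": "IND",
}


def location_to_alpha3(location):
    """Return the alpha-3 code for the last comma-separated part of a
    location string, or None when the name is not in the lookup table."""
    country = location.split(",")[-1].strip()
    return ALPHA3_BY_NAME.get(country)


codes = [
    location_to_alpha3(loc)
    for loc in [
        "LC-39A, Kennedy Space Center, Florida, USA",
        "Site 200/39, Baikonur Cosmodrome, Kazakhstan",
        "Somewhere, Unknown Place",
    ]
]
print(codes)  # ['USA', 'KAZ', None]
```

Returning None for unknown names makes it easy to list the locations that still need a manual mapping before plotting.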
df['Country'] = df['Location'].str.split(',').str[-1].str.strip()
df_sorted = df.sort_values('Country')
fig = px.choropleth(df_sorted, locations='Country', locationmode='country names', color='Mission_Status',
                    projection='natural earth')
fig.update_layout(title_text='Choropleth Map', title_x=0.5)
fig.show()
# Use the extracted Country column as the top level so the chart matches its title
fig = px.sunburst(df, path=['Country', 'Organisation', 'Mission_Status'])
fig.update_layout(title_text='Sunburst Chart - Countries, Organisations, and Mission Status')
fig.show()
# Price is stored as text (e.g. "5,000.0"); summing the raw strings would concatenate them,
# so convert the column to numbers first
price_numeric = pd.to_numeric(df['Price'].astype(str).str.replace(',', ''), errors='coerce')
total_spending = price_numeric.groupby(df['Organisation']).sum()
print(total_spending)
money_spent = df_data[df_data["Price"].notna()].copy()
money_spent["Price"] = money_spent["Price"].str.replace(',', '').astype(float)
organisation_expense = money_spent.groupby("Organisation")["Price"].mean().reset_index()
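# sort_values returns a new DataFrame; the result is not stored here, so head() below keeps the original order
organisation_expense.sort_values("Price", ascending=False)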
organisation_expense.head()
| | Organisation | Price |
|---|---|---|
| 0 | Arianespace | 170.26 |
| 1 | Boeing | 177.29 |
| 2 | CASC | 40.13 |
| 3 | EER | 20.00 |
| 4 | ESA | 37.00 |
import datetime
def extract_year(date_str):
    try:
        dt = datetime.datetime.strptime(date_str, "%a %b %d, %Y %H:%M %Z")
    except ValueError:
        dt = datetime.datetime.strptime(date_str, "%a %b %d, %Y")
    return dt.year
df['Year'] = df['Date'].apply(extract_year)
ds = df['Year'].value_counts().reset_index()
ds.columns = [
    'Year',
    'Count'
]
fig = px.bar(
    ds,
    x='Year',
    y="Count",
    orientation='v',
    title='Number Of launches Per Year'
)
fig.show()
Which month has seen the highest number of launches of all time? Superimpose a rolling average on the month-on-month time series chart.
def extract_month(date_str):
    # Some dates have no time component, so fall back to the date-only format
    try:
        dt = datetime.datetime.strptime(date_str, "%a %b %d, %Y %H:%M %Z")
    except ValueError:
        dt = datetime.datetime.strptime(date_str, "%a %b %d, %Y")
    return dt.month
df['Month'] = df['Date'].apply(extract_month)
month_on_month = df['Month'].value_counts().reset_index()
month_on_month.columns = [
    'Month',
    'Count'
]
fig = px.bar(
    month_on_month,
    x='Month',
    y="Count",
    orientation='v',
    title='Total Number of Missions in Each Month',
    color='Count'
)
fig.show()
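The bar chart above groups launches by calendar month across all years. For the rolling average that the task asks for, one possible approach is to count launches per month of each year and smooth that series. The sketch below uses a tiny stand-in for the Date column and an arbitrary 3-month window; both are assumptions for illustration.

```python
import pandas as pd

# Tiny stand-in for the Date column; the notebook would use df['Date'] instead.
dates = pd.Series([
    "Fri Aug 07, 2020 05:12 UTC",
    "Thu Aug 06, 2020 04:01 UTC",
    "Thu Jul 30, 2020 21:25 UTC",
    "Thu Jul 30, 2020 11:50 UTC",
    "Sat May 30, 2020 19:22 UTC",
])

# Keep only the "Aug 07, 2020" part so rows with and without a time parse alike.
day_part = dates.str.extract(r"^\w{3} (\w{3} \d{2}, \d{4})")[0]
parsed = pd.to_datetime(day_part, format="%b %d, %Y")

# One row per launch, indexed by date, then summed into month-start bins.
# Months without launches appear with a count of 0.
counts = pd.Series(1, index=pd.DatetimeIndex(parsed)).sort_index()
monthly = counts.resample("MS").sum()

# Rolling mean over a 3-month window (the window size is an arbitrary choice).
rolling = monthly.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"launches": monthly, "rolling_mean": rolling}))
```

Both series share the same monthly index, so they can be drawn on one chart, for example as bars for `monthly` with a line for `rolling` on top.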
Some months have better weather than others. Which time of year seems to be best for space missions?
month_on_month.max()
Month     12
Count    450
dtype: int64
month_on_month.min()
Month      1
Count    268
dtype: int64
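Note that `DataFrame.max()` and `DataFrame.min()` work column by column, so the Month and Count values they report do not necessarily come from the same row. A sketch of looking up the busiest and quietest month with `idxmax` and `idxmin`, using made-up counts (not the notebook's numbers):

```python
import pandas as pd

# Made-up counts for illustration only.
month_counts = pd.DataFrame({"Month": [1, 6, 12], "Count": [10, 40, 25]})

# Column-wise maximum: the largest Month and the largest Count, independently.
columnwise_max = month_counts.max()

# Row lookup: the row whose Count is largest or smallest.
busiest = month_counts.loc[month_counts["Count"].idxmax()]
quietest = month_counts.loc[month_counts["Count"].idxmin()]

print("column-wise max:", int(columnwise_max["Month"]), int(columnwise_max["Count"]))
print("busiest month:", int(busiest["Month"]), "with", int(busiest["Count"]), "launches")
print("quietest month:", int(quietest["Month"]), "with", int(quietest["Count"]), "launches")
```

In this toy data the column-wise maximum pairs month 12 with a count of 40, although the count of 40 belongs to month 6.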
Create a line chart that shows the average price of rocket launches over time.
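# Convert Price to numbers, then average it per launch year
df_price = df.assign(Price=pd.to_numeric(df['Price'].astype(str).str.replace(',', ''), errors='coerce'))
yearly_avg = df_price.groupby('Year', as_index=False)['Price'].mean().dropna()
fig = px.line(yearly_avg, x='Year', y='Price', title='Average Launch Price per Year (USD millions)')
fig.show()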
# Work on a copy so the conversion does not trigger chained-assignment warnings
avg_price = df[df["Price"].notna()].copy()
avg_price["Price"] = avg_price["Price"].str.replace(',', '').astype(float)
# Keep only the columns needed for the price-over-time analysis
avg_price = avg_price[['Price', 'Year', 'Month']]
avg_price.head()
| | Price | Year | Month |
|---|---|---|---|
| 0 | 50.00 | 2020 | 8 |
| 1 | 29.75 | 2020 | 8 |
| 3 | 65.00 | 2020 | 7 |
| 4 | 145.00 | 2020 | 7 |
| 5 | 64.68 | 2020 | 7 |
# Select Price so the mean of the Month column is not plotted as well
avg_price.groupby("Year")["Price"].mean().plot(figsize=(12, 8))
<Axes: xlabel='Year'>
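groups = avg_price.groupby("Year")
fig, ax = plt.subplots(figsize=(12, 8))
for year, group in groups:
    # avg_price has no Date column, so plot the mean price per month for each year
    monthly = group.groupby("Month")["Price"].mean()
    ax.plot(monthly.index, monthly.values, label=year)
ax.set_xlabel("Month")
ax.set_ylabel("Price (USD millions)")
ax.set_title("Average Monthly Launch Price by Year")
ax.legend()
plt.show()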
How has the dominance of launches changed over time between the different players?
df.head()
| | Organisation | Location | Date | Detail | Rocket_Status | Price | Mission_Status | Country | Year | Month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SpaceX | LC-39A, Kennedy Space Center, Florida, USA | Fri Aug 07, 2020 05:12 UTC | Falcon 9 Block 5 \| Starlink V1 L9 & BlackSky | StatusActive | 50.0 | Success | USA | 2020 | Aug | 2020 |
| 1 | CASC | Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... | Thu Aug 06, 2020 04:01 UTC | Long March 2D \| Gaofen-9 04 & Q-SAT | StatusActive | 29.75 | Success | China | 2020 | Aug | 2020 |
| 2 | SpaceX | Pad A, Boca Chica, Texas, USA | Tue Aug 04, 2020 23:57 UTC | Starship Prototype \| 150 Meter Hop | StatusActive | NaN | Success | USA | 2020 | Aug | 2020 |
| 3 | Roscosmos | Site 200/39, Baikonur Cosmodrome, Kazakhstan | Thu Jul 30, 2020 21:25 UTC | Proton-M/Briz-M \| Ekspress-80 & Ekspress-103 | StatusActive | 65.0 | Success | Kazakhstan | 2020 | Jul | 2020 |
| 4 | ULA | SLC-41, Cape Canaveral AFS, Florida, USA | Thu Jul 30, 2020 11:50 UTC | Atlas V 541 \| Perseverance | StatusActive | 145.0 | Success | USA | 2020 | Jul | 2020 |
groups = avg_price.groupby("Year")
fig, ax = plt.subplots(figsize=(12, 8))
# Initialize the color palette
colors = plt.cm.tab10.colors
for i, (year, group) in enumerate(groups):
    # Highlight the first year and fade the others
    alpha = 1.0 if i == 0 else 0.3
    # avg_price has no Date column, so plot the mean price per month for each year
    monthly = group.groupby("Month")["Price"].mean()
    # picker=5 lets a click within 5 points of a line trigger a pick event
    ax.plot(monthly.index, monthly.values, label=year, color=colors[i % 10], alpha=alpha, picker=5)
# Set labels and title
ax.set_xlabel("Month")
ax.set_ylabel("Price (USD millions)")
ax.set_title("Average Monthly Launch Price by Year")
# Show the legend
ax.legend()
# Function to update the alpha values on click
def onclick(event):
    # event.artist is the line that was clicked; highlight it and fade the rest
    for line in ax.lines:
        line.set_alpha(1.0 if line is event.artist else 0.3)
    # Redraw the figure
    fig.canvas.draw()
# Connect the onclick event to the figure
fig.canvas.mpl_connect('pick_event', onclick)
# Display the plot
plt.show()
top_10_org = pd.DataFrame(columns=df.columns)
# Collect the rows of the ten organisations with the most launches
for val in df.groupby("Organisation").count().sort_values("Date", ascending=False)[:10].index:
    print(val)
    org = df[df.Organisation == val]
    top_10_org = pd.concat([top_10_org, org], ignore_index=True)
top_10_org
RVSN USSR
Arianespace
General Dynamics
CASC
NASA
VKS RF
US Air Force
ULA
Boeing
Martin Marietta
| | Organisation | Location | Date | Detail | Rocket_Status | Price | Mission_Status | Country | Year | Month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | RVSN USSR | Site 41/1, Plesetsk Cosmodrome, Russia | Wed Aug 12, 1998 12:53 UTC | Molniya-M /Block ML \| Molniya-1 n†133 | StatusRetired | NaN | Success | Russia | 1998 | 8 | 1998 |
| 1 | RVSN USSR | Site 43/3, Plesetsk Cosmodrome, Russia | Thu Aug 29, 1996 05:22 UTC | Molniya-M /Block SO-L \| Interbol 2, Magion5 & ... | StatusRetired | NaN | Success | Russia | 1996 | 8 | 1996 |
| 2 | RVSN USSR | Site 43/3, Plesetsk Cosmodrome, Russia | Wed Aug 02, 1995 23:59 UTC | Molniya-M /Block SO-L \| Interbol 1 & Magion 4 | StatusRetired | NaN | Success | Russia | 1995 | 8 | 1995 |
| 3 | RVSN USSR | Site 32/1, Plesetsk Cosmodrome, Russia | Mon Jul 13, 1992 17:41 UTC | Tsyklon-3 \| Cosmos 2197 to 2202 | StatusRetired | NaN | Success | Russia | 1992 | 7 | 1992 |
| 4 | RVSN USSR | Site 43/3, Plesetsk Cosmodrome, Russia | Wed Jul 08, 1992 09:53 UTC | Molniya-M /Block 2BL \| Cosmos 2196 | StatusRetired | NaN | Success | Russia | 1992 | 7 | 1992 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3622 | Martin Marietta | SLC-41, Cape Canaveral AFS, Florida, USA | Fri Aug 26, 1966 13:59 UTC | Titan IIIC \| IDCSP-1 8-14, GGTS-2 | StatusRetired | NaN | Failure | USA | 1966 | 8 | 1966 |
| 3623 | Martin Marietta | SLC-41, Cape Canaveral AFS, Florida, USA | Thu Jun 16, 1966 14:00 UTC | Titan IIIC \| OPS 9311-9317 & GGTS-1 | StatusRetired | NaN | Success | USA | 1966 | 6 | 1966 |
| 3624 | Martin Marietta | SLC-41, Cape Canaveral AFS, Florida, USA | Tue Dec 21, 1965 14:00 UTC | Titan IIIC \| LES 3 & 4, OV2-3, OSCAR-4 | StatusRetired | NaN | Partial Failure | USA | 1965 | 12 | 1965 |
| 3625 | Martin Marietta | SLC-40, Cape Canaveral AFS, Florida, USA | Fri Oct 15, 1965 17:23 UTC | Titan IIIC \| LCS-2 & OV2-1 | StatusRetired | NaN | Failure | USA | 1965 | 10 | 1965 |
| 3626 | Martin Marietta | SLC-40, Cape Canaveral AFS, Florida, USA | Fri Jun 18, 1965 14:00 UTC | Titan IIIC \| Transtage 5 | StatusRetired | NaN | Success | USA | 1965 | 6 | 1965 |
3627 rows × 11 columns
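def extract_decade(date_str):
    try:
        dt = datetime.datetime.strptime(date_str, "%a %b %d, %Y %H:%M %Z")
    except ValueError:
        dt = datetime.datetime.strptime(date_str, "%a %b %d, %Y")
    # datetime objects have no `decade` attribute, so derive it from the year (e.g. 1987 -> 1980)
    return dt.year // 10 * 10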
df[df.Organisation=="CASC"]
top_10_org.groupby("Organisation").count().sort_values("Date",ascending=False)[:10].index
px.histogram(top_10_org.sort_values(by=["Organisation", "Date"], ascending=[True, False]),
             x="Organisation",
             color='Organisation',
             nbins=10)
The Cold War lasted from the start of the dataset until 1991.
df['Country'].unique()
# Count launches from Kazakhstan (a former Soviet Republic) as Russian
countries_to_replace = ['Kazakhstan', 'Russia']
df.loc[df['Country'].isin(countries_to_replace), 'Country'] = 'Russia'
df['Country']
0 USA
1 China
2 USA
3 Russia
4 USA
...
4319 USA
4320 USA
4321 USA
4322 Russia
4323 Russia
Name: Country, Length: 4324, dtype: object
CW_df = df[(df['Country']=='USA') | (df['Country']=='Russia')]
war = CW_df.sort_values("Year")
war[(war.Year <= 1991)]
| | Organisation | Location | Date | Detail | Rocket_Status | Price | Mission_Status | Country | Year | Month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4323 | RVSN USSR | Site 1/5, Baikonur Cosmodrome, Kazakhstan | Fri Oct 04, 1957 19:28 UTC | Sputnik 8K71PS \| Sputnik-1 | StatusRetired | NaN | Success | Russia | 1957 | 10 | 1957 |
| 4322 | RVSN USSR | Site 1/5, Baikonur Cosmodrome, Kazakhstan | Sun Nov 03, 1957 02:30 UTC | Sputnik 8K71PS \| Sputnik-2 | StatusRetired | NaN | Success | Russia | 1957 | 11 | 1957 |
| 4321 | US Navy | LC-18A, Cape Canaveral AFS, Florida, USA | Fri Dec 06, 1957 16:44 UTC | Vanguard \| Vanguard TV3 | StatusRetired | NaN | Failure | USA | 1957 | 12 | 1957 |
| 4293 | US Air Force | LC-11, Cape Canaveral AFS, Florida, USA | Thu Dec 18, 1958 23:02 UTC | SM-65B Atlas \| SCORE | StatusRetired | NaN | Success | USA | 1958 | 12 | 1958 |
| 4294 | AMBA | LC-5, Cape Canaveral AFS, Florida, USA | Sat Dec 06, 1958 05:44 UTC | Juno II \| Pioneer 3 | StatusRetired | NaN | Partial Failure | USA | 1958 | 12 | 1958 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1755 | NASA | LC-39A, Kennedy Space Center, Florida, USA | Sun Apr 28, 1991 11:33 UTC | Space Shuttle Discovery \| STS-39 | StatusRetired | 450.0 | Success | USA | 1991 | 4 | 1991 |
| 1754 | General Dynamics | SLC-3W, Vandenberg AFB, California, USA | Tue May 14, 1991 15:52 UTC | Atlas-E/F Star-37S-ISS \| NOAA-D | StatusRetired | NaN | Success | USA | 1991 | 5 | 1991 |
| 1753 | RVSN USSR | Site 32/2, Plesetsk Cosmodrome, Russia | Thu May 16, 1991 21:40 UTC | Tsyklon-3 \| Cosmos 2143 to 2148 | StatusRetired | NaN | Success | Russia | 1991 | 5 | 1991 |
| 1762 | RVSN USSR | Site 43/3, Plesetsk Cosmodrome, Russia | Fri Mar 22, 1991 12:19 UTC | Molniya-M /Block ML \| Molniya-3 n†148 | StatusRetired | NaN | Success | Russia | 1991 | 3 | 1991 |
| 1751 | RVSN USSR | Site 32/2, Plesetsk Cosmodrome, Russia | Tue Jun 04, 1991 09:00 UTC | Tsyklon-3 \| Okean 3 | StatusRetired | NaN | Success | Russia | 1991 | 6 | 1991 |
2432 rows × 11 columns
Hint: Remember to include former Soviet Republics like Kazakhstan when analysing the total number of launches.
CW_df["Country"].value_counts().rename_axis("Country").reset_index(name='counts')
| | Country | counts |
|---|---|---|
| 0 | Russia | 2096 |
| 1 | USA | 1344 |
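The counts above are taken from `CW_df`, which holds USA and Russia launches for every year in the dataset; the `Year <= 1991` filter was only applied to the displayed `war` table. A sketch of one way to restrict the comparison to the Cold War years, using a small made-up sample (the rows and the `launches` and `cold_war` names are assumptions for illustration):

```python
import pandas as pd

# Small made-up sample; in the notebook this would be df with Country and Year.
launches = pd.DataFrame({
    "Country": ["Russia", "Kazakhstan", "USA", "USA", "Kazakhstan", "China"],
    "Year": [1957, 1965, 1969, 2020, 2019, 1975],
})

# Treat launches from Kazakhstan as Soviet/Russian, as the hint suggests.
launches["Country"] = launches["Country"].replace({"Kazakhstan": "Russia"})

# Keep only the two superpowers and only launches up to the end of 1991.
is_superpower = launches["Country"].isin(["USA", "Russia"])
is_cold_war = launches["Year"] <= 1991
cold_war = launches[is_superpower & is_cold_war]

counts = cold_war["Country"].value_counts()
print(counts)
```

The same boolean mask could be applied when building `CW_df`, so that the pie chart and the year-on-year plots cover the same period.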
colors = ["purple", "orange"]
grouping = CW_df.groupby("Country").count().reset_index()
sizes = grouping['Mission_Status']
labels = grouping['Country']
plt.pie(sizes, labels = labels, colors = colors)
([<matplotlib.patches.Wedge at 0x7f5194b95150>, <matplotlib.patches.Wedge at 0x7f5194b95960>], [Text(-0.37034241843976107, 1.0357830337981933, 'Russia'), Text(0.3703425154167657, -1.035782999124229, 'USA')])
CW_df.groupby(["Year", "Country"]).size().unstack().plot()
<Axes: xlabel='Year'>
mission_failures = CW_df[CW_df['Mission_Status'] == 'Failure']
failures_yearly = mission_failures.groupby('Year').size()
plt.figure(figsize=(12, 8))
failures_yearly.plot(kind='bar', color='red')
plt.xlabel('Year')
plt.ylabel('Number of Mission Failures')
plt.title('Total Number of Mission Failures Year on Year (Bar Plot)')
plt.show()
plt.figure(figsize=(12, 8))
failures_yearly.plot(kind='line', marker='o', color='blue')
plt.xlabel('Year')
plt.ylabel('Number of Mission Failures')
plt.title('Total Number of Mission Failures Year on Year (Line Plot)')
plt.grid(True)
plt.show()
Did failures go up or down over time? Did the countries get better at minimising risk and improving their chances of success over time?
total_missions = CW_df.groupby('Year').size()
failures = CW_df[CW_df['Mission_Status'] == 'Failure'].groupby('Year').size()
failure_percentage = (failures / total_missions) * 100
failure_percentage.plot(kind='line', figsize=(10, 6))
plt.title('Percentage of Failures over Time')
plt.xlabel('Year')
plt.ylabel('Failure Percentage')
plt.show()
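One subtlety in the division above: a year without any failures is missing from `failures`, so its ratio comes out as NaN instead of 0 and leaves a gap in the line. A minimal sketch of filling those years with zero, using made-up counts (illustrative assumptions, not results from the dataset):

```python
import pandas as pd

# Made-up yearly counts for illustration only.
total = pd.Series({1990: 10, 1991: 8, 1992: 5})
fails = pd.Series({1990: 2, 1992: 1})  # no failures recorded for 1991

# Plain division aligns on the index, so 1991 becomes NaN.
naive_pct = fails / total * 100

# Reindexing onto every year with a fill value of 0 gives 0% for 1991.
pct = fails.reindex(total.index, fill_value=0) / total * 100

print(naive_pct)
print(pct)
```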
Do the results change if we only look at the number of successful launches?
df.Country.nunique()
21
df.head()
| | Organisation | Location | Date | Detail | Rocket_Status | Price | Mission_Status | Country | Year | Month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SpaceX | LC-39A, Kennedy Space Center, Florida, USA | Fri Aug 07, 2020 05:12 UTC | Falcon 9 Block 5 \| Starlink V1 L9 & BlackSky | StatusActive | 50.0 | Success | USA | 2020 | 8 | 2020 |
| 1 | CASC | Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... | Thu Aug 06, 2020 04:01 UTC | Long March 2D \| Gaofen-9 04 & Q-SAT | StatusActive | 29.75 | Success | China | 2020 | 8 | 2020 |
| 2 | SpaceX | Pad A, Boca Chica, Texas, USA | Tue Aug 04, 2020 23:57 UTC | Starship Prototype \| 150 Meter Hop | StatusActive | NaN | Success | USA | 2020 | 8 | 2020 |
| 3 | Roscosmos | Site 200/39, Baikonur Cosmodrome, Kazakhstan | Thu Jul 30, 2020 21:25 UTC | Proton-M/Briz-M \| Ekspress-80 & Ekspress-103 | StatusActive | 65.0 | Success | Russia | 2020 | 7 | 2020 |
| 4 | ULA | SLC-41, Cape Canaveral AFS, Florida, USA | Thu Jul 30, 2020 11:50 UTC | Atlas V 541 \| Perseverance | StatusActive | 145.0 | Success | USA | 2020 | 7 | 2020 |
top_countries = []
for year in df['Year'].unique():
    year_data = df[df['Year'] == year]
    top_country = year_data['Country'].value_counts().idxmax()
    top_countries.append((year, top_country))
top_countries_df = pd.DataFrame(top_countries, columns=['Year', 'Top Country'])
plt.figure(figsize=(10, 6))
plt.bar(top_countries_df['Year'], top_countries_df['Top Country'])
plt.xlabel('Year')
plt.ylabel('Top Country')
plt.title('Top Country with Most Launches Each Year')
plt.xticks(rotation=90)
plt.show()
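The chart above counts every launch. To see whether the picture changes when only successful launches are counted, the same ranking can be computed on a filtered frame. The sketch below uses a made-up sample, and `top_country_per_year` is a hypothetical helper, not part of the notebook:

```python
import pandas as pd

# Made-up sample for illustration; the notebook would use df instead.
sample = pd.DataFrame({
    "Year": [1960, 1960, 1960, 1960, 1960, 1961, 1961, 1961],
    "Country": ["USA", "USA", "USA", "Russia", "Russia", "Russia", "Russia", "USA"],
    "Mission_Status": ["Failure", "Failure", "Success", "Success", "Success",
                       "Success", "Success", "Failure"],
})


def top_country_per_year(frame):
    """Return, for each year, the country with the most rows in the frame."""
    return frame.groupby("Year")["Country"].agg(lambda s: s.value_counts().idxmax())


top_all = top_country_per_year(sample)
top_success = top_country_per_year(sample[sample["Mission_Status"] == "Success"])

# Side-by-side comparison of the two rankings.
print(pd.DataFrame({"all_launches": top_all, "successful_only": top_success}))
```

In this toy data the USA leads 1960 by total launches, while Russia leads once failures are excluded, which is the kind of difference the question is probing for.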
Which organisation was dominant in the 1970s and 1980s? Which organisation was dominant in 2018, 2019 and 2020?
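org_launches = df.groupby("Year")["Organisation"].value_counts().rename_axis(["year", "Organisation"]).reset_index(name='counts')
# Organisation with the most launches in each year (stored so the result is not discarded)
dominant_org = org_launches.loc[org_launches.groupby("year")["counts"].idxmax()]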
org_launches.head()
| | year | Organisation | counts |
|---|---|---|---|
| 0 | 1957 | RVSN USSR | 2 |
| 1 | 1957 | US Navy | 1 |
| 2 | 1958 | US Navy | 12 |
| 3 | 1958 | AMBA | 7 |
| 4 | 1958 | RVSN USSR | 5 |
org_set = set(org_launches['Organisation'])
plt.figure(figsize=(12, 10), dpi=80)
for org in org_set:
    selected_data = org_launches.loc[org_launches['Organisation'] == org]
    plt.plot(selected_data['year'], selected_data['counts'], label=org)
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
           fancybox=True, shadow=True, ncol=6)
plt.show()
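With one line per organisation the chart gets crowded, so the question about specific periods may be easier to answer by counting launches inside each range of years. A sketch under the assumption of a frame with `Year` and `Organisation` columns; the sample rows are made up and `dominant` is a hypothetical helper:

```python
import pandas as pd

# Made-up sample for illustration only; not results from the dataset.
sample = pd.DataFrame({
    "Year": [1972, 1975, 1978, 1983, 1985, 1988, 2018, 2019, 2020, 2020],
    "Organisation": ["RVSN USSR", "RVSN USSR", "NASA", "RVSN USSR", "NASA",
                     "RVSN USSR", "CASC", "CASC", "SpaceX", "CASC"],
})


def dominant(frame, first_year, last_year):
    """Organisation with the most launches between first_year and last_year,
    both inclusive."""
    in_range = frame[frame["Year"].between(first_year, last_year)]
    return in_range["Organisation"].value_counts().idxmax()


seventies = dominant(sample, 1970, 1979)
eighties = dominant(sample, 1980, 1989)
recent = dominant(sample, 2018, 2020)
print(seventies, eighties, recent, sep="\n")
```

Calling `dominant(df, 1970, 1979)` on the real data would give the answer for the 1970s, and likewise for the other periods.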